Document weight Query weight Top ten Scheme name
Authors
Abstract
The goal in information retrieval is to enable users to automatically and accurately retrieve data relevant to their queries. One possible approach to this problem is to use the vector space model, which represents documents and queries as vectors in the term space. The components of the vectors are determined by the term weighting scheme. This paper compares a selected set of the available term weighting schemes to determine which weighting method is best suited to Arabic data collections. Our results show that the best method is the probabilistic inverse (IDFP) method, and we recommend using it as a global weighting method for Arabic data collections.
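As a rough illustration of the vector space model with a probabilistic inverse document frequency as the global weight, the sketch below ranks tokenized documents against a query by cosine similarity. It is not taken from the paper: the formula log((N - n) / n) is the commonly cited form of probabilistic IDF, the floor at zero is an added assumption, and the function names (`idf_probabilistic`, `build_vectors`, `cosine`, `rank`) are hypothetical. Documents are assumed to be already tokenized; Arabic-specific preprocessing such as stemming is not modeled.

```python
import math
from collections import Counter


def idf_probabilistic(num_docs, doc_freq):
    """Probabilistic IDF, log((N - n) / n), floored at 0 (the floor is an assumption)."""
    if doc_freq <= 0 or doc_freq >= num_docs:
        return 0.0
    return max(0.0, math.log((num_docs - doc_freq) / doc_freq))


def build_vectors(docs):
    """Return ({term: weight} per document, idf table), with weight = tf * IDFP."""
    num_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {t: idf_probabilistic(num_docs, n) for t, n in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (score, doc_index) pairs sorted by descending score."""
    vectors, idf = build_vectors(docs)
    # Query terms unseen in the collection get weight 0.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scores = [(cosine(q_vec, d), i) for i, d in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))
```

Under this weighting, a term that appears in at least half of the documents receives a weight of zero, so only relatively rare terms contribute to the ranking.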
Similar resources
On Improving Pseudo-Relevance Feedback using an Absorbing Document
Pseudo-Relevance Feedback assumes that the top-ranked k documents of the initial retrieval are relevant, and then terms of these documents are used to re-weight the terms of the initial query (add new terms and/or change the weights of existing terms in the query). In this paper, we propose a new approach for query expansion for ad hoc search, by using an absorbing document which is the cross p...
Pseudo-Relevance Feedback Method based on the Cross Product of Irrelevant Documents
Pseudo-Relevance Feedback assumes that the top-ranked k documents of the initial retrieval are relevant, and then terms of these documents are used to re-weight the terms of the initial query (add new terms and/or change the weights of existing terms in the query). In this paper, we propose a new approach for query expansion for ad hoc search, by using an absorbing document which is the cross p...
Effective Structured Query Formulation for Session Search
In this work, we focus on formulating effective structured queries for session search. For a given query, phrase-like text nuggets are identified and formulated into Lemur queries to feed into the Lemur search engine. Nuggets are substrings in qn, similar to phrases but not necessarily as semantically coherent as phrases. We assume that a valid nugget appears frequently in top returned snip...
SJTU at TREC 2004: Web Track Experiments
Yiming Lu, Jian Hu, Fanyuan Ma (Department of Computer Science & Engineering, Shanghai Jiaotong University, Shanghai 200030) {luyiniao, hujian, ma-fy}@sjtu.edu.cn Abstract: This is the first year our lab has participated in TREC. We participate in the Mixed-Query task for the Web track. All the runs we submitted are based on the modified Okapi weighting scheme. Besides, we used several heurist...
Document Re-ordering Based on Key Terms in Top Retrieved Documents
In this paper, we propose a method to improve the precision of top retrieved documents by re-ordering the retrieved documents in the initial retrieval. To re-order the documents, we first automatically extract key terms from top N (N<=30) retrieved documents, then we collect key terms that occur in query and their document frequencies in top N retrieved documents, finally we use these collected...
Publication date: 2010